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Block  Diagram  of  a  Perceptual  Audio  Encoder 


Source:  Brandenburg,  "Vorlesung:  Dig.  Audiosignalverarbeitung” 
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Information  Processing  in  the  Auditory  System 


•  basilar  membrane  as  a  filter  bank 


•  bank  of  highly  overlapping  bandpass  filters 

•  the  magnitude  responses  are  asymmetric  and  nonlinear  (level 
dependent) 


•  non-uniform  bandwidth,  and  the  bandwidths  increase  with 
increasing  frequency 


Source:  Zwicker  &  Fasti  "Psychoacoustics  Facts  and  helicotremo 

Models" 
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Threshold  in  Quiet  or  the  Absolute  Threshold 

•  Hearing  threshold  of  1 00  persons  with  normal  hearing  for  sine  tones 
(50%  curve  is  the  median) 


a  *  ■  *  Stuttgart. 

•  Approximations: 

LH  = 364  iikY0'8  -  (-0-6  (£  - 3-3)2)  + 10'3  (-£)* 

4  kHz  Signal  with  Amplitude  ±  1  LSB  (16  bit ) 

Source:  U.  Zolzer,  “Digital  Audio  Signal  Processing” 
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Critical  Bands: 
Bark  Scale 


-  Critical-band  concept 
used  in  many  models  and 
hypothesis, 

-  unit  was  defined  leading 
to  so-called  critical-band 
rate  scale 

•  scale  ranging  from  0  - 
24,  unit  “Bark” 


•  relation  between  z  and  f 
is  important  for  under¬ 
standing  many  character¬ 
istics  of  human  ear 


f/Bark 

/u/Ha 

/o/Hz 

/m/H* 

0 

0 

100 

100 

50 

1 

100 

200 

100 

150 

2 

200 

300 

100 

250 

3 

300 

400 

100 

350 

4 

400 

510 

110 

450 

5 

510 

630 

120 

570 

6 

630 

770 

140 

700 

7 

770 

920 

150 

640 

8 

020 

1080 

160 

1000 

0 

1080 

1270 

190 

1170 

10 

1270 

1460 

210 

1370 

11 

1480 

1720 

240 

1600 

12 

1720 

2000 

280 

1850 

13 

2000 

2320 

320 

2150 

14 

2320 

2700 

360 

2500 

15 

2700 

3150 

450 

2900 

16 

3150 

3700 

550 

3400 

17 

3700 

4400 

700 

4000 

18 

4400 

5300 

900 

4800 

19 

5300 

6400 

1100 

5800 

20 

6400 

7700 

1300 

7000 

21 

7700 

0500 

1800 

8500 

22 

9500 

1200(3 

2500 

10500 

23 

24 

12000 

15500 

15500 

3500 

13500 
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Critical  Bands:  Bark  Scale 


•  Critical-band  concept  used  in  many  models  and  hypotheses 
->unit  was  defined  leading  to  so-called  critical-band  rate  scale 

•  scale  ranging  from  0  -  24,  unit  "Bark"  (after  Zwicker) 

•  One  Bark  corresponds  to  one  critical  band 

•  Attempt  to  approximate  critical  bands  with  formulas: 


Critical  Band  rate  z: 

z 


Bark 
Critical  Bandwidth : 


=  13  arctanf  0.76 


/ 


kHz 


+  3.5arctan 


( 


f 


\7.5  kHz 


A/fl  =  25  +  75  1  +  1.4 


0.69 
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Tonality  (1) 


•  Tonality  index  a: 

-  noisy  signal:  a  =  0 

-  tonal  signal:  a  =  1 

•  System  theory 

-  Sharp  spectral  lines  =  Signal  is  periodic 
=  Signal  is  predictable 

-  Approximation:  If  the  signal  is 
predictable  then  it  should  be  periodic 

-  Therefore  we  can  use  prediction  to 
approximate  if  a  signal  is  tonal  (by 
periodicity) 
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Tonality  (2) 


Example: 

•  calculate  the  predictability: 

(.  n  -Kt,/))2  ($(t,/)  -  c  -»  1  :  noisy  signal 

~  ~Jyz  '  0^  Yyz  c  ->  0  :  tonal  signal 

■  If  c(t,/)  >  l  set  it  to  1  ->  a(t,f )  =  |c(t,/)  -  1| 

amplitude  of  a  spectral  line  in  time  and  frequency  r{t,  /) 
phase  of  a  spectral  line  in  time  and  frequency 
predicted  values  f(l,k)  for  amplitude  and  <f>(i,/c)  for  phase 

r(t,f )  =  r(t  -  1 ,/)  +  (r(t  -  1,/)  -  r(t  -  2,/)) 

=  <t(t  -  1,/)  +  (o(t  -  l,/)  -  <Kt  -  2,/)) 
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-  Spreading  Function 


L  [dB]  | 

Ls(f)  | 

fit ...  lower  frequency  limit 
fu. ...  upper  frequency  limit 
of  Critical  Band  i 

Power  Spectral  Density 
of  signal  x(t) 

sP(n  =  x2R(o+x?(o 

Sound  Pressure  Level  _ 

Ls(f)  =  10  l°g  10  Sp(f) 


Simultaneous 


Source:  U.  Zolzer,  "Digital  Audio  Signal 
Processing" 
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Calculating  the  Masking  Threshold 


Comparison  of  the  signal  level  to  Masking  Threshold 


0r(i ) 
dB 


=  a(  14.5  +  £)  +  (!  —  a)av 


=  —  2  —  2.05arctan  ( 


/ 


\4  kHz , 


—  0.75  arctan 


f 


2,56  kHz2 


a  ...  T onality  Index 
av  ...  Noise  Coefficient 


Approximation  (a=l:  tonal): 

of  (0 

— — —  =  a(  14.5  +  0  +  (1  —  a)5.5 


Different  Masking  with  different  maskers: 

-  Tone  masking:  (14.5  +  i)  dB,  where  /is 
the  integer  number  for  the  critical  band 

-  Noise  as  a  masker:  5.5  dB 


Simultaneous  Masking 
Threshold  (Power) 

r.  n  rv^i/in  LTH  ...Sound  Pressure  Level 

T(f)  =  10  s  f  Of  (i)  ...  Distance  to  Masking  Threshold 
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In-Band  Masking 
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Masking  Neighboring  Bands 


-  spread  of  masking  due  to  the  non-linearity  of  auditory  filters 

-  resulting  masking  threshold  =  sum  of  power  of  neighboring  spreading  functions 

-  here:  value  at  intersection  of  neighboring  spreading  functions  taken 


(24+0.23( 


_L_ 

kHz 


-1 

)  -0.2 


Ls{f K  dB 
dB  Bark 
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Masking  Neighboring  Bands 
Non-Linear  Superposition 

•  The  total  Masking  Threshold  of  the  ear  results  from  non-linear  superposition. 

•  resulting  masking  threshold  =  sum  of  fractional  power  of  neighbouring 
spreading  functions 

•According  to  Frank  Baumgarte,  Charalampos  Ferekidis,  Hendrik  Fuchs:  “A 
Nonlinear  Psychoacoustic  Model  Applied  to  the  ISO  MPEG  Layer  3  Coder”,  99th 
AES  Convention,  October  1,  1995. 

ftp://mpeg.tnt.uni-hannover.de/pub/papers/1995/AES99-FB.ps.gz 

and 

•R.  A.  Lutfi.  "A  Power-Law  Transformation  Predicting  Masking  by  Sounds  with 
Complex  Spectra”.  J.  Acoust.  Soc.  Am.  77  (6),  June  1985. 

•  With  IT  k  the  intensity  of  the  k'th  speading  function  (with  the  "intensity”  acting 
like  a  power),  and  a  suitable  parameter  ”a”  we  get  the  intensity  of  the  total 
masking  threshold  as 


IT(Z,)  =  ['HlrIT,k(Zi)a] 


1  la 


•According  to  the  references,  a=0.3  is  in  good  agreemend  with  psycho-acoustics 
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Python  Example,  Spreading  Function 

•  This  Python  example  shows  the  non-linear  superposition  with  parameter 
2*a=alpha=0.6,  in  the  Bark  scale.  We  construct  a  matrix  which  does 
the  actual  superposition  in  the  Bark  domain,  because  that  is  most 
efficient: 


def  spreadingf unct ionmat  (maxfreq,  nfilts,  alpha)  : 

#Arguments:  maxfreq:  half  the  sampling  frequency 
#nfilts:  Number  of  subbands  in  the  Bark  domain,  for  instance  64 
fadB=  14.5+12  #  Simultaneous  masking  for  tones  at  Bark  band  12 
fbdb=7 . 5  #  Upper  slope  of  spreading  function 


fbbdb=26.0  #  Lower  slope  of  spreading  function 
maxbark=hz2bark (maxfreq) 

spreadingf  unct  ionBarkdB=np  .  zeros  ( 2  *nfilt  s ) 

#upper  slope,  fbdB  attenuation  per  Bark,  over  maxbark  Bark  (full  frequency  range), 
with  fadB  dB  simultaneous  masking: 

spreadingfunctionBarkdB [0 : nfilts] =np . linspace ( -maxbark* fbdb, -2.5, nfilts) -fadB 

flower  slope  fbbdb  attenuation  per  Bark,  over  maxbark  Bark  (full  frequency  range) : 

spreadingfunctionBarkdB  [nfilts  :  2*nfilts]  =np .  linspace  (0,  -maxbark* fbbdb,  nfilts)  -fadB 

fConvert  from  dB  to  "voltage"  and  include  alpha  exponent 

spreadingfunctionBarkVoltage=10 . 0** (spreadingfunctionBarkdB/ 20 . 0*alpha) 

fSpreading  functions  for  all  bark  scale  bands  in  a  matrix: 
spreadingf uncmatrix=np  .  zeros  (  (nfilts,  nfilts)  ) 
for  k  in  range  (nfilts )  : 

spreadingf uncmatrix [ : , k] =spreadingfunctionBarkVoltage [ (nfilts-k) : (2*nfilts-k) ] 

return  spreadingfuncmatrix 

14 
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Python  Example,  Spreading  Function 


•  The  application  ot  the  spreading  function  is  then  a  simple  matrix 
multiplication  (which  avoids  slow  “for"  loops)  in  the  Bark  domain,  as  in 
the  following  Python  function: 

def  maskingThresholdBark(mXbark, spreadingfuncmat rix, alpha) : 

#Computes  the  masking  threshold  on  the  Bark  scale  with  non-linear  superposition 
#usage :  mTbark=maskingThresholdBark(mXbark, spreadingfuncmat rix, alpha) 

#Arg:  mXbark:  magnitude  of  FFT  spectrum, 

#spreadingfuncmat rix:  spreading  function  matrix  from  function  spreadingfunctionmat 
#alpha:  exponent  for  non-linear  superposition  (eg.  0.6) 

#return:  masking  threshold  as  "voltage"  on  Bark  scale 

#mXbark:  is  the  magnitude-spectrum  mapped  to  the  Bark  scale, 

#mTbark:  is  the  resulting  Masking  Threshold  in  the  Bark  scale,  whose  components  are 
#sqrt(I_tk)  on  page  13. 

mTbark=np . dot (mXbark**alpha ,  spreadingfuncmat  rix) 

#apply  the  inverse  exponent  to  the  result: 

mTba  rk=mTba  rk** ( 1 . 0/alpha ) 

return  mTbark 
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Python  Example,  Spreading  Function 


•  We  can  take  a  look  at  the  resulting  spreading  function  matrix  with: 

from  psyacmodel  import  * 

import  matplotlib. pyplot  as  pit 

fs=320O0  #  sampling  frequency  of  audio  signal 

maxf req=f s/2 

alpha=0.6  #Exponent  for  non-linear  superposition  of  spreading  functions 
nfilts=64  #number  of  subbands  in  the  bark  domain 

spreadingfuncmatrix=spreadingfunctionmat (maxf req, nfilts, alpha) 

pit .imshow(spreadingfuncmatrix) 

pit .title( 'Matrix  spreadingfuncmat rix  as  Image1) 

pit .xlabel ( 1  Bark  Domain  Subbands1) 

pit .ylabel( 1  Bark  Domain  Subbands1) 

pit . show( ) 
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Python  Example,  Spreading  Function 
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Masking  Neighboring  Bands 
Non-Linear  Superposition 

•  Observe  that  we  don't  need  any  tonality  estimation  for  this  model 

•  Usually  our  signal  from  the  filter  bank  is  like  a  “voltage",  not  like  a 
power  as  in  this  model. 

•  We  obtain  a  "power"  if  we  square  our  signal. 

•  Hence  our  exponent  is  multiplied  by  a  factor  of  2. 

•  We  get  a  ->  2*a,  hence  our  exponent  becomes  0.6. 

•  Observe:  The  frequency  index  is  on  the  Bark-scale,  as  can  be  seen  in 
slide  12 

•  Hence  we  need  a  mapping  from  Hertz  to  Bark,  from  our  linear  filter 
bank  scale  to  the  bark  scale,  where  we  apply  masking. 

•  Then  we  need  an  inverse  mapping,  from  Bark  to  Hertz,  to  apply 
our  found  masking  threshold  to  the  quantization  stepsizes  of  our 
linearly  spaced  subbands. 
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Bark  Scale  Approximations 


•  There  are  several  functional  approximations  of  the  Bark  scale  for  this 
mapping. 

•  An  example  of  an  overview  can  be  seen  in 

https://ccrma.stanford.edu/courses/120-fall-2003/lecture-5.html 

•  The  approximation  we  previously  saw  is  from: 

Zwicker  &  Terhardt  (1980),  "Analytical  expressions  for  critical-band  rate  and 
critical  bandwidth  as  a  function  of  frequency",  Article  in  The  Journal  of  the 
Acoustical  Society  of  America  68(5):1523  ■  November  1980 

•https://www.researchgate.net/publication/209436182_Analytical_expressions_for_c 

ritical-band  rate_and_critical_bandwidth_as_a_function_of_frequency 

•  Also  in  Wikipedia 

•  In  Python  notation,  the  approximation  is,  with  f  in  Herz  and  z  in  Bark: 

•  z=13*arctan(0.00076*f)+3.5*arctan((f/7500.0)**2) 

•  It  only  has  an  approximate  closed  form  inverse  formula,  according  to 
http://www.auditory.org/postings/1995/34.html: 

•f=  (((exp(0.219*z)/352.0)+0.1)*z-0.032*exp(-0.15*(z-5)**2))*1000 
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Bark  Scale  Approximations,  Zwicker&Terhard 


•  We  can  test  the  Zwicker  &  Terhard  approximation  in  ipython: 

ipython  -pylab 

#Frequency  array  between  0  and  20000  Hz  in  1000  steps: 
f =1 in space (0, 20000, 1000) 

#Computation  of  Zwickers  Bark  approximation  formula: 

z=13*arctan (0 . 00076*f ) +3 . 5*arctan ( (f /7500 . 0) **2) 

#plot  Bark  over  Hertz: 
plot ( f , z ) 

xlabel ( 1  Frequency  in  Hertz') 

ylabel ( 1  Frequency  in  Bark') 

title ( 1 Zwicker&Terhard  Approximation  1 ) 
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Bark  Scale  Approximations,  Zwicker&Terhard 
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Bark  Scale  Approximations,  Zwicker&Terhard, 

Inverse 

•  We  can  test  the  Zwicker  &  Terhard  inverse  approximation  in  ipython: 

ipython  -pylab 

#Frequency  array  between  0  and  20000  Hz  in  1000  steps: 
f =1 in space (0, 20000, 1000) 

#Computation  of  Zwickers  Bark  approximation  formula: 

z=13*arctan (0 . 00076*f ) +3 . 5*arctan ( (f /7500 . 0) **2) 

#computation  of  the  approximate  inverse,  free:  reconstructed  freq. : 

frec=  ( ( (exp (0 . 219*z) /352.0)+0.1) *z-0 . 032*exp (-0 . 15* (z-5) **2) ) *1000 

#plot  reconstructed  freq.  Over  original  freq: 
plot  (f, free) 

#comparison:  identity: 
plot ( f , f ) 

xlabel (' Frequency  in  Hertz') 
ylabel ( 1  Frequency  in  Hertz') 

title (' Zwicker&Terhard  Forward  and  Inverse  Approximation') 
legend ((' Zwicker  Forward  and  Inverse', 'Identity')) 
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Bark  Scale  Approximations,  Zwicker&Terhard 

Inverse 


Zwicker&Terhard  Forward  and  Inverse  Approximation 
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Bark  Scale  Approximations,  Traunmueller 


•  Traunmueller-formula,  1990,  from: 

•  Traunmuller,  H.  (1990).  "Analytical  expressions  for  the  tonotopic  sensory  scale". 
The  Journal  of  the  Acoustical  Society  of  America. 

•  Also  in  Wikipedia: 

•  In  Python  notation,  the  approximation  is,  with  f  in  Herz  and  z  in  Bark: 

•  for  above  200  Hz: 

z=26.81*f/(1960.0+f)-0.53 

•  below  200Hz: 

z=  f/102.9 

•  It  has  an  exact  inverse: 

•  Above  200  Hz: 

f  =1960. 0/(26. 81/(z+0.53)-l) 

•  Below  200  Hz: 

f=z*102.9 
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Bark  Scale  Approximations,  Schroder 


•  -Schroeder,  M.  R.  (1977).  Recognition  of  Complex,  Acoustic 

•Signal  &  Life  Sciences  Research  Report  5,  edited  by  T.  H.  Bullock  (Abakon  Verlag, 
Berlin),  p.  324. 

•  See  also:  "Perceptual  linear  predictive  (PLP)  analysis  of  speech"  by  Hynek 
Hermansky,  J.  AcousL  Soc.  Am.  87  (4).  April  1990, 
(http://seed.ucsd.edU/mediawiki/images/5/5c/PLP.pdf) 

•  It  is  eq.  (3),  for  angular  frequency,  which  is  in  turn  from  Schroeder  above 

•  Also  used  in  the  PEAQ  standard  for  objective  quality  estimation  (eq.  (2)  in  the 
paper: 

•  https://www.ee.columbia.edu/~dpwe/papers/ThiedeOO-PEAQ.pdf 

•  "PEAQ-The  ITU  Standard  for  Objective  Measurement  of  Perceived  Audio  Quality", 
THILO  THIEDE  et  al.,  J.  Audio  Eng.  Soc.,  Vol.  48,  No. 1/2,  2000  January/February 

•  It  is  the  simplest  Approximation: 

z=  6*arcsinh(f/600.0) 

•It  has  an  exact  inverse,  Bark  to  Hertz: 
f=600  *  sinh(z/6.0) 
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Bark  Scale  Approximations,  Comparisons 


•  Comparison  of  our  functional  approximation  with  our  Bark-Table. 

•The  approximation  formulas  also  give  fractional  Bark  numbers,  and  the  integer 
Bark  numbers  correspond  to  unique  frequencies,  which  are  a  band  limit. 

•Tables  name  bands  after  an  integer  Bark  number,  but  they  differ  in  if  the  band 
above  or  below  is  named  after  that  number. 

•In  the  lecture  table  this  integer  Bark  number  corresponds  to  the  lower  limit  of  the 
band,  hence  it  starts  with  index  0,  in  other  literature  (CCRMA  Webpage)  and 
Wikipedia  to  the  upper  limit,  starting  with  index  1! 

•We  use  these  pairs  out  of  the  table  for  our  comparison: 

•1  bark  -  100Hz 
•10  Bark  -  1270Hz 
•15  -  2700  Hz 
•20  -  6400  Hz 
•22  -  9500  Hz 
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Bark  Scale  Approximations,  Comparisons 


•  Use  ipython  for  the  comparison: 

ipython  — pylab 
f=arange (0, 20000, 10) 

z=26 . 81*f / (1960 . 0+f ) -0 . 53  #Traunmueller 

plot  (f,z) 

z=  6*arcsinh (f /600 . 0)  #Schroeder 

plot  ( f , z ) 

z=13*arctan (0 . 00076*f ) +3 . 5*arctan ((f/7500.0)**2)  #Zwicker 

plot  ( f , z ) 

legend ( ( 1 Traunmueller 1 ,  1 Schroeder 1 ,  1 Zwicker 1 ) ) 

#plot  single  comparison  points: 

plot ([100,1270, 2700, 6400, 9500, 15500], [1, 10, 15, 20, 22, 24] , 'ro') 

xlabel (' Frequency  (Hz) ') 
ylabel ( ' Bark  1 ) 

tit le (' Approximations  of  the  Bark  Scale') 
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Bark  Scale  Approximations,  Comparisons 


Approximations  of  the  Bark  Scale 
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Bark  Scale  Approximations 


•  Observe:  The  Zwicker  approximation  is  the  most  precise,  it  hits  our  test  points, 
but  it  has  no  closed  form  inverse. 

•The  Schroeder  approximation  is  the  least  accurate,  but  it  is  the  simplest  and  it  has 

an  exact  inverse  in  closed  form,  hence  it  is  used  most  often,  and  we  will  also  use  it. 

• 

•In  Python  we  use  the  function: 

def  hz2bark(f ) : 

.  Method  to  compute  Bark  from  Hz.  Based  on  : 

https : //git hub . com/ stephencwelch/Perceptual- Coding -In -Python 
Args  : 

f  :  (ndarray)  Array  containing  frequencies  in  Hz. 

Returns  : 

Brk  :  (ndarray)  Array  containing  Bark  scaled  values. 

n  ii  ii 

Brk  =  6.  *  np.arcsinh(f/600. ) 

return  Brk 
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Bark  Scale  Approximations 


•  The  inverse  function  in  Python  is: 

def  bark2hz (Brk) : 

.  Method  to  compute  Hz  from  Bark  scale.  Based  on  : 

https : //git hub . com/ stephencwelch/Perceptual- Coding -In -Python 
Args  : 

Brk  :  (ndarray)  Array  containing  Bark  scaled  values. 

Returns  : 

Fhz  :  (ndarray)  Array  containing  frequencies  in  Hz. 

ii  ii  ii 

Fhz  =  600.  *  np. sinh(Brk/6. ) 


return  Fhz 
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Bark  Scale  Mapping 

•We  choose  64  subbands  in  the  Bark  scale,  hence  each  about  1/3  Bark  wide. 

•In  Python  we  construct  a  matrix  W  for  this  mapping  (again,  to  avoid  slow  “for” 
loops),  which  has  l's  at  the  position  of  each  such  1/3  Bark  band: 

def  mapping2barkmat (fs) : 

Constructing  matrix  W  which  has  l's  for  each  Bark  subband,  and  0's  else: 
nfft=2048;  nfilts=64;  nf reqs=nfft/2 

#the  linspace  produces  an  array  with  the  fft  band  edges: 
binbarks  =  hz2bark(np.linspace(0, (nfft/2) , (nf ft/2)+l) *f s/nf ft ) 

W  =  np . zeros ( (nfilts ,  nfft)) 
for  i  in  xrange(nfilts) : 

W[i,0: (nfft/2)+l]  =  (np. round (binbarks/step_barks)==  i) 
return  W 
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Bark  Scale  Mapping 

•Matrix  W  as  image  in  Python: 


f s=32000 


W=mapping2barkmat (f s) 
pit . imshow(W[ : , : 256] , cmap=' Blues ' ) 
pit . title( 1  Mat rix  W  as  Image1) 
pit .xlabel( 1  Uniform  Subbands1) 
pit .ylabel( 1  Bark  Subbands1) 

plt.show()  r  , 

Matrix  W  for  Uniform  to  Bark  Mapping  as  Image 


Prof.  Dr.-lng.  K.  Brandenburg,  bdg@idmt.fraunhofer.de  Prof.  Dr.-lng.  G.  Schuller,  shl@idmt.fraunhofer.de _ 

Ctv 

©  Fraunhofer  IDMT  TECHNISCHE  UNIVERSITAT 

ILMENAU 


32 


Fraunhofer 


IDMT 


Bark  Scale  Mapping 


•For  each  such  1/3  bark  subband  we  add  the  signal  powers  from  the  corresponding 
DFT  bands. 

•Then  we  take  the  square  root  to  obtain  a  “voltage"  again. 

•As  Python  function: 


def  mapping2bark(mX,W) : 

#Maps  (warps)  magnitude  spectrum  vector  mX  from  DFT  to  the  Bark  scale 
#returns:  mXbark,  magnitude  mapped  to  the  Bark  scale 
#Frequency  of  each  FFT  bin  in  Bark,  in  1025  frequency  bands  (from  call) 
nfft=2048;  nfilts=64;  nf reqs=nf ft/2 

#Frequencies  of  each  FFT  band,  up  to  Nyquits  frequency,  converted  to  Bark: 
#Here  is  the  actual  mapping,  suming  up  powers  and  conv.  back  to  Voltages: 

mXbark  =  (np.dot(  np.abs(mX[ :nf reqs] )**2.0,  W[:,  :nf reqs] .T) )**(0.5) 

return  mXbark 
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Mapping  from  Bark  scale  back  to  Linear 


•After  having  computed  the  masking  threshold  in  the  Bark  scale,  we  need  to  map  it 
back  to  the  linear  scale  of  our  filter  bank 

•For  that  we  need  to  "distribute"  the  corresponding  power  of  each  of  our  1/3  Bark 
bands  into  the  corresponding  filter  bank  bands  on  the  linear  frequency  scale 

•Then  we  take  the  square  root  to  obtain  a  "voltage"  again. 

•We  again  contruct  a  matrix  to  do  that  in  Python.  When  there  is  1  subband  in  the 
1/3  bark  scale,  it  gets  a  factor  1,  if  there  are  2  subbands,  they  get  a  factor  of 
sqrt(2),  and  so  on,  using  a  diagonal  matrix  multiplication  for  those  factors.  It  is  an 
64x1024  matrix: 

def  mappingf rombarkmat (W) : 

Constructing  matrix  W  inv  from  matrix  W  for  mapping  back  from  bark 
scale 

nfft=2048;  nf reqs=nfft/2 

W_inv=  np.dot (np.diag( (1.0/np. sum(W, 1) )**0.5) ,  W[:,0:nfreqs  +  1]).T 

return  W  inv 
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Mapping  from  Bark  scale  back  to  Linear 


Matrix  W_inv  for  Bark  to  Uniform  Mapping  as  Image 

o 


•Matrix  WJnv  as  image  in  python: 


W_inv=mappingf  romba  rkmat ( W) 

pit . imshow(W_inv[ : 256 , : ] , cmap='Blues ' ) 

pit .title( 'Matrix  Winv  as  Image') 

pit .xlabel (' Bark  Subbands') 

pit .ylabel( ' Uniform  Subbands') 

pit . show( ) 


0  50 

Bark  Subbands 
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Mapping  from  Bark  scale  back  to  Linear 


•The  function  for  mapping  the  masking  threshold  from  Bark  scale  to  linear  scale  is 

def  mappingf rombark(mTbark,W_inv) : 

#Maps  (warps)  magnitude  spectrum  vector  mTbark  in  the  Bark  scale 
#  back  to  the  linear  scale 

#returns:  mT,  masking  threshold  in  the  linear  scale 
nfft=2048;  nf reqs=nfft/2 

mT  =  np. dot (mTbark,  W_inv[:,  :nfreqs].T) 

return  mT 
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Hearing  Threshold  in  Quiet 


•On  top  of  our  signal  adaptive  masking  threshold,  we  have  the  threshold  in  quiet. 
•We  have  an  approximation  formula  from  Zoelzer:  “Digital  Audio  Signal  Processing” 

•  For  the  case  of  quiet  and  only  a  barely  audible  test  tone. 

•  The  approximation  for  this  Level  of  the  Threshold  in  Quiet,  LTQ,  in  dB  and  in 
Python  notation  is: 

LTQ=3.64  *  (f/1000.)  **(-0.8)  -  6.5*np.exp(  -0.6  *  (f/1000.  -  3.3)  **  2.)  + 
le-3*((f/1000.)  **  4.) 

•Plot  it  with  ipython: 

ipython  --pylab 
f =1 in space (20 , 20000, 1000) 

LTQ=3. 64* (f/1000.) **-0.8  -6 . 5*np . exp (-0 . 6* (f /1000 . -3 . 3) **2 . ) +le- 
3*(  (f/1000. )**4.) 

semilogx (f , LTQ) 
axis ( [20 , 20000 ,  -20,80]) 
xlabel ( ' Frequency/Hz '  ) 
ylabel ( ' dB ' ) 

title (’ Approx .  Function  for  Masking  Threshold  in  Quiet’) 
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Hearing  Threshold  in  Quiet 
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Hearing  Threshold  in  Quiet 


•The  dB  of  the  formula  is  for  sound  pressure.  Our  internal  representation  has  +-1 
as  a  full  scale,  which  corresponds  to  0  dB.  Assume  we  play  back  our  audio  signal 
such  that  full  scale  appears  at  a  sound  level  of  speech,  which  is  about  60  dB. 

Hence  to  convert  the  sound  level  to  our  internal  representation,  we  need  to 
reduce  the  threshold  of  quiet  by  60  dB. 

•Even  with  an  audio  signal  the  masking  threshold  in  quiet  still  matters  at  the  lowest 
and  highest  frequencies. 

•We  combine  the  signal  dependent  masking  threshold  and  the  threshold  in  quiet 
by  taking  the  maximum  of  the  two  at  each  frequency. 

•In  Python  we  clip  the  result  to  avoid  overloading  and  numerical  problems,  and 
correct  our  masking  threshold  with: 

LTQ=np. clip(LTQ, -20,60) 

#Shift  dB  according  to  our  internal  representation: 

LTQ=LTQ-60 
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Hearing  Threshold  in  Quiet,  Testing 


•  We  can  test  our  approximation  formula  for  our  hearing  threshold  in  quiet  by 
producing  noise  below  this  spectral  threshold,  and  then  listen  to  it.  If  we  don't 
hear  the  noise  it  works! 

We  can  use  the  Python  function  in  our  program  maskinginquietdemo.py  (see  our 
Moodle  page): 

noisef romdBSpectrum ( spec, fs) 

•With:  spec:  spectral  shape  of  the  produced  noise  in  dB,  fs:  sampling  rate 

•Then  we  can  listen  to  the  sound  corresponing  to  our  threshold  approximation  with 
with: 

f=np . 1 inspace ( 0 , f s/2 , N) 

LTQ=np . clip ( (3.64* (f/1000.) **-0.8  -6 . 5*np . exp (-0 . 6* (f/1000.- 
3.3) **2 . ) +le-3* ( (f/1000 .) **4 .)),  -20,  60) 

#Shift  dB  according  to  our  internal  representation: 

LTQ=LTQ-60 

#Play  back  noise  shaped  like  the  masking  theshold  in  quoet : 
noisef romdBSpectrum (LTQ,  fs) 
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Hearing  Threshold  in  Quiet,  Testing 


•  We  can  start  the  complete  demo  with: 

•Python  maskinginquietdemo . py 


•Observe:  White  noise  (flat  spectrum)  is  clearly  audible 

•Noise  shaped  according  to  our  threshold  approximation  should  be  inaudible! 
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The  Complete  Psycho-Acoustic  Model 


•Now  our  complete  psycho-acoustic  model  for  the  computation  of  our  masking 
threshold  is: 
f s=32000 

W=mapping2barkmat (f s) 

W_inv=mappingf rombarkmat (W) 

def  maskingThreshold (mX,  W,  W_inv,fs): 

#Input:  magnitude  spectrum  of  a  DFT  of  size  2048 

#Returns:  masking  threshold  (as  voltage)  for  its  first  1025  subbands 

#Map  magnitude  spectrum  to  1/3  Bark  bands: 

mXbark=mapping2bark(mX,W) 

#Compute  the  masking  threshold  in  the  Bark  domain: 

mTbark=maskingThresholdBark(mXbark) 

#Map  back  from  the  Bark  domain, 

#Result  is  the  masking  threshold  in  the  linear  domain: 
mT=mappingf rombark(mTbark,W_inv) 

#Threshold  in  quiet: 
f=np.linspace(0,f s/2, 1025) 

LTQ=np.min( (3.64*(f/1000. )**-0.8  -6.5*np.exp(-0.6*(f/1000. -3.3)**2. )+le- 
3*( (f/1000. )**4. ) ,50*np.ones(len(f ) ) ) f 0) 
mT=np.max( (mT,  10.0**( (LTQ-60)/20) ) ,0) 


return  mT 


42 


Prof.  Dr.-In g  K.  Brande n b u rg,  bdg@idmt.fraunhofer.de  Prof.  Dr.- 1 n g  G.  Schuller,  shl @ idmt.fraun h ofer.de 


Fraunhofer 


©  Fraunhofer  IDMT 


TECHNISCHE  UNIVERSITAT 

ILMENAU 


IDMT 


The  Complete  Psycho-Acoustic  Model 


*Example  for  our  complete  psycho-acoustic  model  : 

from  psyacmodel  import  * 
fs=32000  #sampling  rate  in  Hz 

W=mapping2barkmat (fs)  #Compuatuionn  of  mapping  to  Bark  matrix 
W_inv=mappingf rombarkmat (W)  Computation  of  Bark  to  linear  matrix 

mX=np.linspace(10,0, 1024)  #Example  magnitude  spectrum 

mT=maskingThreshold(mX,  W,  W_inv,fs) 

pit . plot (mT) 

pit . title( 1  Masking  Theshold  including  Threshold  in  Quiet1) 
pit . plot (mX) 

pit . legend (( 1  Masking  Trheshold1,  'Magnitude  Spectrum1)) 

pit .xlabel( 1 FFT  subband1) 

pit . ylabel ( "Magnitude  ( 1  Voltage 1 ) " ) 

pit . show( ) 
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The  Complete  Psycho-Acoustic  Model 
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The  Complete  Psycho-Acoustic  Model 

This  example  is  an  idealized  tone  in  one  subband,  and  its 
resulting  masking  threshold,  which  is  mostly  its  spreading 
function: 
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Masking  Theshold  including  Threshold  in  Quiet 


1  1 

Masking  Trheshold 
Magnitude  Spectrum 

- 

The  Complete  Psycho-Acoustic  Model 


*Example,  complete  demo: 

python  psyacmodel . py 

•Real-Time  Audio  Demo: 
python  psycho-acoustic-modelDFTgs . py 

•Try  different  inputs: 

•Silence,  to  see  the  threshold  in  quiet. 

•A  tone,  to  see  its  spreading  function. 

•A  complex  music  signal,  to  see  its  masking  threshold. 

•Observe:  here  we  can  use  music  or  a  sinusoidal  tone  of  1  kHz  frequency  as  input, 
and  shift  th  masking  threshold  in  the  dB  domain  to  find  the  precise  threshold  at 
which  the  added  noise  becomes  inauddible. 
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Physical  Models  of  Hearing 


•Physical  models  doen't  model  the  effects  of  hearing,  but  the  physical 
functioning  of  the  inner  ear  instead. 

•As  a  result,  their  output  is  an  internal  representation,  not  a  masking  threshold 
•But  they  can  still  be  used  to  compute  a  similarity  measure  of  2  different  sounds, 
as  perceived  by  the  human  ear,  by  comparing  their  internal  representations. 

•An  example  is  the  "PEMO-Q"  measure,  to  estimate  the  “quality”  of  a  sound 
compared  to  an  original. 

•[1]:  http://ieeexplore.ieee.org/Xplore/login.jsp?url  =  http%3A%2F%2Fieeexplore.ieee.org 
%2Fiel5%2F10376%2F36074%2F01709880.pdf&authDecision=-203 

•It  is  used  as  part  of  "PEASS”  toolkit. 
•(https://hal.inria.fr/inria-00567152/PDF/emiya2011.pdf) 

•This  is  used  for  instance  for  measuring  the  quality  of  audio  source  separtion. 
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Physical  Models  of  Hearing,  PEMO  Model 


audio  signal 
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next  lecture: 

09.11.  -  Quantization  and  Coding 
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